
Generative AI and Large Language Models

In recent years, Generative Artificial Intelligence (AI) has revolutionized various fields by enabling machines to create and simulate human-like content. From text and code generation and realistic image and video creation to 3D scene reconstruction and 3D protein structure prediction, generative AI has the potential to reshape industries and redefine the way we interact with technology. At the core of many modern generative AI systems are Large Language Models (LLMs), which are capable of understanding and generating human language.

In this article, we will introduce generative AI and its applications, then transition to LLMs, focusing on autoregressive language models. We will explain how they work by examining their autoregressive generation and attention mechanisms. Lastly, we will discuss adopting LLMs in application development, emphasizing the application content, the data and the model.


What is Generative AI?

Unlike traditional AI models that focus on recognizing patterns or making predictions based on input data, generative AI models learn the underlying distribution of the training data and use it to produce new output. These outputs can take various forms, including text, images, video, music, audio, and even complex data structures such as software code, scene scripts and protein structure descriptions.

Through training on vast amounts of data over extensive computing time, generative AI models learn the statistical properties and patterns within the training data, enabling them to produce realistic and coherent output. With instruction fine-tuning, generative AI can follow human instructions and intentions to create new content, and even collaborate with designers and artists in co-creating architectural layouts, paintings, and music compositions.

Several types of generative AI models are commonly used: Autoregressive Models, Diffusion Models, Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs).


Common Types of Generative AI Models

Autoregressive Models: These models generate data one step at a time, with each step conditioned on the previously generated outputs. They model the probability distribution of a sequence by breaking it down into a product of conditional probabilities. Autoregressive models have shown impressive results in human-like language generation, and examples include OpenAI’s Generative Pre-trained Transformers (GPT) and Recurrent Neural Networks (RNNs).
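
To make the step-by-step conditioning concrete, the sketch below performs greedy next-token generation with a causal language model. It assumes the Hugging Face Transformers library and the publicly available GPT-2 checkpoint, chosen here purely for illustration:

    # Greedy autoregressive decoding: each new token is conditioned on all previous tokens.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer("Generative AI is", return_tensors="pt").input_ids
    for _ in range(20):                                           # generate 20 tokens, one step at a time
        with torch.no_grad():
            logits = model(ids).logits                            # shape: (1, sequence_length, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # pick the most likely next token
        ids = torch.cat([ids, next_id], dim=-1)                   # append it and condition on it next step

    print(tokenizer.decode(ids[0]))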

Diffusion Models: Diffusion models are used for high-quality image and video synthesis from text prompts. These models learn to generate data by reversing a gradual noising process. During training, noise is incrementally added to the data, and the model learns to reconstruct the original data from the noisy versions. At generation time, the model starts from random noise and iteratively denoises it to produce new data samples. Diffusion models have shown remarkable results in image and audio generation. Examples include Sora, DALL-E 2/3 and Stable Diffusion.
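
Schematically, sampling from a diffusion model is an iterative denoising loop. The sketch below is illustrative rather than any particular model's API; denoise_step is a hypothetical function standing in for a trained noise-prediction network applied at each timestep:

    import torch

    def diffusion_sample(denoise_step, shape=(1, 3, 64, 64), num_steps=50):
        x = torch.randn(shape)                  # start from pure Gaussian noise
        for t in reversed(range(num_steps)):    # walk the noising process backwards
            x = denoise_step(x, t)              # remove a little noise at timestep t
        return x                                # final sample, e.g. an image tensor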

Generative Adversarial Networks (GANs): Consist of two neural networks—the generator and the discriminator—that are trained simultaneously through adversarial processes. The generator creates fake data, while the discriminator evaluates it against real data, pushing the generator to produce increasingly realistic outputs.
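
The adversarial training loop can be sketched as follows, assuming hypothetical PyTorch generator G and discriminator D modules whose discriminator outputs are probabilities in [0, 1]; this is a minimal illustration, not a complete training recipe:

    import torch
    import torch.nn.functional as F

    def gan_train_step(G, D, real, opt_g, opt_d, latent_dim=100):
        # 1) Discriminator update: label real samples as 1 and generated samples as 0.
        fake = G(torch.randn(real.size(0), latent_dim)).detach()  # detach so G is not updated here
        pred_real, pred_fake = D(real), D(fake)
        loss_d = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) \
               + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # 2) Generator update: push D to classify newly generated samples as real.
        pred = D(G(torch.randn(real.size(0), latent_dim)))
        loss_g = F.binary_cross_entropy(pred, torch.ones_like(pred))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()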

Variational Autoencoders (VAEs): Encode input data into a latent space and then decode it to reconstruct the data. VAEs can generate new samples by sampling from the latent space.
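
Once trained, generating new data with a VAE reduces to sampling latent vectors from the prior and decoding them. A minimal sketch, assuming a hypothetical trained PyTorch decoder module:

    import torch

    def vae_generate(decoder, num_samples=4, latent_dim=32):
        z = torch.randn(num_samples, latent_dim)   # sample latent vectors from the prior N(0, I)
        return decoder(z)                          # decode them into new data samples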

Applications of Generative AI

Depending on its output format, generative AI has applications across different domains, mainly including natural language processing, computer vision, and audio synthesis. Here are a few examples:

  • Natural Language Processing (NLP): Large language models, such as GPTs, Llama and Qwen, are used to generate human-like text, making them suitable for applications such as chatbots, content creation, and translation services.

When adopting generative AI in application development, one may choose between LLMs, Vision Language Models (VLMs), or other multi-modal models, depending on the type of input data and the intended outputs.

There are several constraints to consider when deploying applications using LLMs, including model size, hallucination, and data privacy. Firstly, the size of large language models ranges from a few million parameters to several trillion. Selecting the appropriate model size requires balancing computational resources with desired performance. Secondly, one of the challenges when using LLMs is their potential to generate hallucinations—incorrect or fabricated information. Existing methods, such as retrieval-augmented generation (RAG) and graph-based LLMs, can mitigate hallucinations to some extent. However, for critical applications, it is essential to have subject matter experts involved in the application loop to validate LLM outputs. Thirdly, if data privacy is a primary concern, it may be advisable to use open-source LLMs instead of closed-source models. Open-source models provide greater transparency and control over data handling, reducing the risks associated with data privacy.
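
To illustrate how retrieval-augmented generation grounds an LLM's answer in retrieved documents, here is a schematic sketch in which embed, vector_store and llm_generate are hypothetical components rather than a specific library's API:

    def rag_answer(question, embed, vector_store, llm_generate, top_k=3):
        # 1) Retrieve the passages most relevant to the question.
        passages = vector_store.search(embed(question), k=top_k)
        # 2) Ground the prompt in the retrieved passages to reduce hallucination.
        context = "\n".join(p.text for p in passages)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return llm_generate(prompt)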

Generative AI presents incredibly promising opportunities for application development across various domains, including industry, healthcare and legal services, and even the acceleration of scientific research. Successful integration of generative AI and LLMs requires careful consideration and understanding of the application content, data and models. For real-world applications, concerns such as hallucination and data privacy need to be carefully addressed. By thoroughly understanding these aspects, we can harness the full potential of generative AI and LLMs to create powerful, transformative solutions, push the boundaries of AI and accelerate innovation.


References:

Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., ... & Jumper, J. M. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 1-3.

Avetisyan, A., Xie, C., Howard-Jenkins, H., Yang, T. Y., Aroudj, S., Patra, S., ... & Balntas, V. (2024). SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model. arXiv preprint arXiv:2403.13064.

Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., ... & Ganapathy, R. (2024). The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.


Author

Contact Information: Liang Nanying (Dr)
School of Information Technology
Nanyang Polytechnic
E-mail: [email protected]

Dr Liang Nanying is a Lecturer at Nanyang Polytechnic’s School of Information Technology. Dr Liang has more than 15 years of experience in Machine Learning and Artificial Intelligence, both in research and industry applications. Before joining Nanyang Polytechnic, she was a Senior Research Fellow at the Surbana Jurong - Nanyang Technological University Joint Corporate Lab, focusing on applications of AI and robotics for Building Information Modelling.